Abstract - Joint data management (JDM) includes the
hardware  (e.g.  sensors/targets),  software  (e.g. 
processing/algorithms),  and  operations  (environments)  of 
data  exchange  that  enable  persistent  surveillance  in  the 
context  of  a  data-to-decision  (D2D)  information  fusion 
enterprise.  Key  attributes  of  an  information  system  require 
pragmatic  assessment  of  data  and  information 
management,  distributed  communications,  knowledge 
representation,  human-systems  interaction,  as  well  as  a 
balanced  sensor  mix,  algorithm  choice,  and  life-cycle  data 
management.  Throughout  the  paper,  we  seek  to  describe 
the  current  technology,  research  approaches,  and  metrics 
that  influence  a  realizable  JDM  product.  We  develop  JDM 
methods  for  structured  and  unstructured  data  to  determine 
an  accurate  target  track  and  identification  as  a  moving 
intelligence  (MOVINT)  capability.  We  examine 
classification  methods  of  unstructured  data  using  seismic, 
acoustic,  and  combined  fusion  methods  for  data  analysis 
and  information  management. 

Keywords: Information Fusion, MOVINT, data management, unstructured data, target tracking

1  Introduction 

The  goal  of  Joint  Data  Management  (JDM)  for  MOVINT 
(intelligence  about  a  moving  object)  is  to  design  a  tool  that 
supports  sensor  placement,  optimal  data  collection,  and 
active  sensor  management  for  decision  support,  in  an 
environment  where  data  exchange  is  seamless,  efficient, 
and  appropriate  across  potentially  diverse  stakeholders. 
With limited sensor resources, sensor use must be optimized
to maximize the sensor utility for users observing moving
targets [1]. The utility is based on the measures of
effectiveness, which can vary over the targets of interest,
sensor types, environmental conditions, situational context,
and users [2].

MOVINT  is  an  intelligence  gathering  method  by  which 
images  (IMINT),  non-imaging  products  (MASINT),  and 
signals  (SIGINT)  produce  a  movement  history  of  objects 
of  interest.  MOVINT  provides  both  tactical  and 
operational  intelligence  (situational  awareness)  of  the 
dynamic environment. One example of MOVINT is
detecting objects moving in an urban area [3]. Detection
can be performed by fixed ground cameras or by cameras on
dynamic unmanned aerial vehicles (UAVs). If the sensors
are on UAVs, path planning is needed to route the UAVs
to observe the cars [4, 5], and cooperation among UAVs is
necessary [6]. The Defense Advanced Research Projects
Agency (DARPA) Grand Challenge featured sensors on
mobile unmanned ground vehicles (UGVs) observing the
environment [7]. Mobile sensing can be used to orient [8]
or to conduct simultaneous localization and mapping (SLAM)
[9] to observe the environment or targets [10].

A significant challenge in detecting and tracking
moving vehicles in an urban area over a long period of
time is to acquire data in a persistent, pervasive, and
occlusion-compensating manner [11]. There has been a
recent surge in the design and deployment of wide
field-of-view systems known as wide area motion imagery
(WAMI) sensors, including the DARPA Autonomous
Real-Time Ground Ubiquitous Surveillance Imaging
System (ARGUS-IS). At any given instant, these sensors
produce images with dramatically varying point spread
functions across a very large field of view, and any given
location undergoes persistent observation of varying spatial
fidelity from different viewing directions as the sensor
moves steadily in a fixed pattern above the city [11, 12]. A
substantial amount of preprocessing, coupled with
frame-to-frame or frame-to-DTED (digital terrain elevation
data) registration, is applied before an image sequence can
be analyzed in the context of multi-target tracking or a
historical baseline, similar to UAV video analysis or object
deformation measurement tasks [13, 14]. Detection,
feature extraction, post-processing, object detection,
tracking, and track-stitching of moving vehicles in these
videos remain a complex problem in terms of fusion,
computation, and throughput [13, 15]. Motion
detection-based track initialization for vehicle and people
tracking using the flux tensor, aligned motion history
images, and related approaches has been shown to be
versatile [12, 16, 17, 18]. Scaling these algorithms to
very large WAMI sequences will require improved
computer vision algorithms and multicore parallelization
[15, 19]. Joint data management, summarization, and
retrieval using content-based querying and searching of
visual information with user feedback remain a
significantly challenging area [20, 21].

Deployed ground sensors can observe the targets;
however, they are subject to the quality of the sensor
measurements as well as obscurations. One interesting
question is how to deploy the fixed sensors so as to optimize
the performance of a system. Efforts in distributed
wireless sensor networks (WSNs) [22] have raised many
issues in distributed processing, communications, and data
fusion [23]. Facilitating both WSNs and decision support
requires efforts in understanding the user’s needs [24], the
theoretical and knowledge models [25], and situational
awareness processing techniques [26]. In a dynamic
scenario, resource coordination [27] is needed both for
context assessment and for awareness of impending
situational threats [28].

[Report Documentation Page, Standard Form 298 (Rev. 8-98).
Report date: JUL 2011. Title: Joint Data Management for
MOVINT Data-to-Decision Making. Performing organization:
Naval Research Laboratory, Washington, DC 20375.
Distribution: Approved for public release; distribution
unlimited. Supplementary notes: Presented at the 14th
International Conference on Information Fusion held in
Chicago, IL on 5-8 July 2011. Sponsored in part by Office of
Naval Research and U.S. Army Research Laboratory. Security
classification: unclassified. Number of pages: 8.]

For distributed sensing systems, combining sensors,
data, and user analysis requires pragmatic approaches to
metrics [26, 29, 30, 31]. For example, Blasch developed
fusion Quality of Service (QoS) metrics [26], Zahedi and
Bisdikian [32] developed a Quality of Information (QoI)
architecture for comparing centralized versus
distributed sensor network deployment planning, and
Bisdikian et al. [33] proposed Value of Information (VoI)
metrics that can be useful in D2D evaluation.

Information fusion research has addressed database
problems for target trafficability (i.e., terrain information)
[34], sensor management [35], and processing algorithms
[36] from which to assess objects in the environment.
Various techniques have incorporated grouping object
movements [37], road information [38, 39], and updating
the object states based on environmental constraints [40].
Detecting, classifying, identifying, and tracking objects
[41] has been important for a variety of sensors, including
2D visual, radar [42], and hyperspectral [43] data;
however, newer methods are of interest for ground sensors
with 1D signals.

The DARPA Sensor Information Technology (SensIT)
program investigated deploying a distributed set of
wireless sensors along a road to classify vehicles, as shown
in Figure 1. Given the deployed set of sensors, feature
vectors derived from the seismic and acoustic signals were
used to classify the vehicles [44]. Various
approaches include combining the data with decision
fusion [45], value fusion [46], and simultaneous track and
identification (ID) methods [47]. Information-theoretic
approaches, including the Kullback-Leibler method, were
applied to the data for sensor management [48].
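
As a minimal illustration of the information-theoretic idea, the sketch below ranks sensors by the expected Kullback-Leibler divergence between the posterior and the prior over target classes. It assumes a discrete set of classes and a per-sensor likelihood table; the function names, the two-class setup, and all numbers are hypothetical and are not taken from [48].

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def posterior(prior, likelihood_column):
    """Bayes update of the class prior given one observed sensor reading."""
    joint = [pr * lk for pr, lk in zip(prior, likelihood_column)]
    total = sum(joint)
    return [j / total for j in joint]

def expected_information_gain(prior, likelihood):
    """likelihood[c][z] = P(reading z | class c); returns E_z[KL(posterior || prior)]."""
    gain = 0.0
    for z in range(len(likelihood[0])):
        column = [likelihood[c][z] for c in range(len(prior))]
        p_z = sum(pr * lk for pr, lk in zip(prior, column))
        if p_z > 0:
            gain += p_z * kl_divergence(posterior(prior, column), prior)
    return gain

# Hypothetical two-class problem with two candidate sensors.
prior = [0.5, 0.5]
sensors = {
    "seismic": [[0.9, 0.1], [0.2, 0.8]],   # discriminates the classes well
    "acoustic": [[0.6, 0.4], [0.5, 0.5]],  # nearly uninformative
}
gains = {name: expected_information_gain(prior, lk) for name, lk in sensors.items()}
best_sensor = max(gains, key=gains.get)
```

Under these assumed likelihoods the seismic sensor has the larger expected gain, so a sensor manager using this criterion would task it first.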


Figure 1. SensIT data, from M. F. Duarte and Y. H. Hu,
“Vehicle Classification in Distributed Sensor Networks,” 2004 [44].

Much work has been completed using imaging sensors
and radar sensors for observing and tracking targets.
Video sensors are limited in power and subject to
day/night conditions. Likewise, line-of-sight constraints
preclude radar sensors from observing in the same plane.
Neither imaging nor radar sensors have the advantages of
unattended ground sensors (UGS), which can power on
and off, can work for a long time on battery power, and
can be deployed to remote areas.

Track management situational awareness tools receive
input from sensor feeds (examples include electro-optical,
radar, electronic support measures (ESM), and sonar) and
display this information to a user. User inputs include the
creation of new objects, such as tracks, contacts, and
targets. Methods to reduce data-to-decision (D2D) time
include fusing multiple tracks into a single track,
incorporating alerting mechanisms, and visualizing track
data in a common operational picture (COP). Sensor and track
data can grow rapidly as the user desires to keep historical
data. Wikipedia states that relational database
management systems (RDBMS) [49] provide support for
track management; however, an RDBMS requires a high
level of maintenance, provides limited support for ad hoc
querying, involves rigid storage paradigms, and has
scalability issues.

Our goal is to determine the possible JDM for D2D,
from the unstructured data to the classification decision,
over varying environmental conditions. JDM includes (1)
sensor management and placement of the UGSs, (2)
intelligent use of the data based on its value for
classification, (3) coordination of sensor data for detection,
classification, or both, and (4) metrics to support
sensor and data management under user control.
Together, these factors have to be addressed in decision
support tools that aid an operational team that deploys,
maintains, repairs, and then utilizes the data over a
distributed network. Section 2 discusses classification,
Section 3 details unstructured data, and Sections 4 and 5
present an example with conclusions.

2  Target  Location  /  Classification 

We desire to produce a JDM system for D2D with a
MOVINT capability, which raises the question: what
characteristics are relevant for such a system? MOVINT is
an intelligence gathering method by which images
(IMINT), non-imaging products (MASINT), and signals
(SIGINT) produce a movement history of objects.

The goal is to utilize UGS sensors, which may be
acoustic, magnetic, seismic, and pyroelectric (passive
infrared, PIR), for motion detection. With a variety of
sensors, information fusion of JDM for D2D can (a) utilize
the most appropriate sensor at the correct time, (b) combine
information from multiple sensors on a single platform, (c)
combine results from multiple platforms, and (d) cue other
sensors in a hand-off fashion to effectively monitor the
area. Sensor exploitation requires an analysis of feature
generation, extraction, and selection (or construction,
transformation, selection, and evaluation). To provide
track and ID results, we develop a MOVINT capability of
target location and identification.

Sensor exploitation includes detection, recognition,
classification, identification, and characterization of
objects. Individual classifiers can be deployed at each
level to robustly determine the object information.
Popular methods include voting, neural networks, fuzzy
logic, neuro-dynamic programming, support vector
machines, and Bayesian and Dempster-Shafer methods. One
way to ensure an accurate assessment is to use a
combination of classifiers [50]. Classifier combination
methods need to be compared with respect to
decisions, feature sets, and user involvement. Selecting
the optimal feature set depends on the situation and
environmental context in which the sensors are deployed.
Typically, a mobile sensor needs to optimize its route and
can be subject to interactive effects of pursuers and
evaders with other targets [51] as well as active jamming
of the signal [52].

Detecting targets from seismic and acoustic data in a
distributed, net-centric fashion requires pragmatic
approaches to sensor and data management [53].
Robustly tracking and identifying a target requires both the
structured data from the kinematic movements and the
unstructured data for the feature analysis [54].
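
One simple way to combine per-sensor classifiers is product-rule (naive Bayes) fusion of their class posteriors. The sketch below assumes the seismic and acoustic classifiers are conditionally independent given the class. The class labels, function name, and probabilities are hypothetical and serve only to illustrate the combination step, not the method of any cited work.

```python
def fuse_posteriors(posteriors, prior):
    """Product-rule fusion under an assumed conditional independence of sensors:
    P(c | z_1..z_n) is proportional to prior(c) * prod_i [P(c | z_i) / prior(c)].
    """
    scores = {}
    for c in prior:
        score = prior[c]
        for p in posteriors:
            score *= p[c] / prior[c]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical class labels and per-sensor classifier outputs.
prior = {"wheeled": 0.5, "tracked": 0.5}
seismic = {"wheeled": 0.7, "tracked": 0.3}
acoustic = {"wheeled": 0.6, "tracked": 0.4}

fused = fuse_posteriors([seismic, acoustic], prior)
label = max(fused, key=fused.get)
```

With these assumed inputs the two sensors agree, so the fused belief in the "wheeled" class is stronger than either sensor reports alone.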

3  Unstructured  Data 

Because effective MOVINT must incorporate diverse data
structures, it is important that a JDM system address
concerns of unstructured data. Unstructured data (versus
structured data) refers to computerized information that
does not have a data structure (i.e., does not exist within a
database). Examples of “unstructured data” may include
(1) textual: documents, presentations, spreadsheets,
scanned images, etc., (2) imagery: multimedia files,
streaming video, etc., (3) HUMINT: reports, audio files,
and gestures, (4) sensors: seismic, acoustic, magnetic,
sonar, etc., and (5) environmental: weather, GIS, etc. All
of the data has to be collected, acquired, exploited, stored,
recalled, and tagged, not to mention a host of other
activities. Most of the data that is collected has some
structure; however, for information fusion the inherent
structure is not common among entities.


“80% of all enterprise data is unstructured.”

Figure  2.  Description  of  Unstructured  Data. 

Research  has  shown  that  over  95%  of  the  digital  universe 
is  unstructured  data.  According  to  these  studies,  80%  of 
all  stored  organizational  data  is  unstructured  [55,  56]. 
This presents a critical challenge for large data
technologies, specifically in the area of data exchange,
because unstructured data must be structured before
knowledge can be extracted and must therefore undergo
some sort of transformation. This transformation affects
the manner in which the data is stored, accessed, and
utilized. Its effects are visible in the metadata, where the
information contained in the data itself is described,
illustrating the implications of data exchange for data
integration. The relationship between data exchange and
data integration is not trivial, and from a decision-making
perspective the two must be tightly linked because
data is exchanged for a purpose, likely with other data.
When characterized in this manner, the performance of
data exchange has an implicit dependency on integration
and therefore on schema synthesis.

Managing data requires dealing with both structured and
unstructured data, with methods that allow the user and the
algorithm to understand the credibility and complexity of
the data.

3.1  Unstructured  Information  Challenge 

Regardless of the unstructured or structured nature of data,
the premise of data exchange suggests a need for a
unifying, ideally universal, data schema. Achieving such
a unified schema in the near term, particularly in a
dynamic and diverse environment, is unlikely. However,
that does not preclude the research merit in attempting to
achieve such an objective; rather, it underscores the
importance of doing so.

The unified data integration model for situation
management developed by Yoakum-Stover and Malyuta
[57] presents a database-centric theoretical solution for
unified storage of structured data that is viable in
ultra-large-scale systems environments. This solution is based
on their Data Definition Framework (DDF). The DDF
consists of six primitives (signs, mentions, terms, concepts,
statements, and predicates) that describe the fundamental
elements of data generically. The research proposes that
these primitives can be utilized as a lossless foundational
structure with which to decouple vocabularies/data models
from the source data artifacts.

While the objective of a lossless unifying data model
that allows integration of disparate data sources and model
semantics is laudable as well as desirable, many practical
considerations that have historically characterized data
integration and fusion present challenges to any solution’s
viability. Setting aside any sociological, behavioral, or
organizational obstacles to unified information spaces,
which are not the focus of the research, the authors’
solution takes a step in the direction of addressing the
practical technical issues. Despite the innovations present
in the DDF, it suffers from some limitations that are
particularly critical to a unified model. Most significantly,
the linkages between the data and the model prevent the
DDF from capturing concepts for which no data exists,
which is essential for any unifying schema. To this extent,
the DDF would be effectively useless in cases where
sparsity is high or in cold-start situations such as those
that would exist in ranking or recommendation
decision support systems [58]. Further, the DDF also lacks
the notion of element ordering or an implementation to
capture constraints, participation, and cardinality. To
effectively utilize such an approach, it is essential to extend
the work of Yoakum-Stover and Malyuta to address these issues.

The  DDF  is  only  one  notion  of  a  unifying  schema 
approach  and  there  are  others,  including  the  Extended 
Entity  Relationship  data  model  (EER)  [59],  the 
Amsterdam  Hypermedia  Model  [60],  the  object-oriented 
predicate  calculus  [61],  the  UCLA  M  Model  [62],  and  the 
iMeMex Data Model. While each has individual benefits
over the others, these models generally tend to focus on
logical schema definition. The Amsterdam Hypermedia
Model and the UCLA M Model target multimedia,
timeline, and simulation data and as such lack broad
generalizability to other data types. EER has grown in
popularity and has become the basis for contemporary
relational database modeling due to its visual
effectiveness, but it lacks the rich semantics of
object-oriented or other modeling constructs and is bound by
the scaling limitations of entity-relational structures.

3.2  InfoGrid  NoSQL  and  Probe  Framework:  Data 
Exchange  in  a  Non-Relational  Schema 

Given a generalized information model, there must exist
an architecture that can support joint information
management. InfoGrid [63] is an open-source, modular
software architecture built around a graph database
that abstracts the data stores’ interface to web applications.
Figure 3 illustrates the high-level architecture of InfoGrid.
The design objectives of InfoGrid were to support a broad
set of information types, connect information from
different sources with an integrated, schema-driven
application programming interface, and support
a broad range of applications. Within the InfoGrid
structure, information is modeled as a semantic network.
The design of InfoGrid addresses the join-scalability
problem of relational databases and separates the tight
integration between the data and the application.

InfoGrid specifically targets web applications. The
Probe Framework, which is built on the InfoGrid platform,
makes the content of external data stores and sources
appear as InfoGrid objects that self-update. The Probe
Framework does this by shadowing the content of external
sources as they change, through probes that monitor and
control updating, effectively creating decentralized data
sources with federated governance within the scope of the
InfoGrid infrastructure. From this perspective, probes
operate like services that extend the external data source
into the InfoGrid platform on which applications are layered.
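
The shadowing idea can be sketched in a few lines. The classes below are hypothetical stand-ins written for illustration only; they are not the InfoGrid, MeshBase, or Probe Framework API. A probe reads an external source, mirrors its current records into a small in-memory graph store, and retires nodes whose records have disappeared.

```python
class GraphStore:
    """Tiny in-memory stand-in for a graph database (hypothetical, not MeshBase)."""
    def __init__(self):
        self.nodes = {}      # node id -> property dict
        self.edges = set()   # (source id, relation, target id)

    def upsert(self, node_id, properties):
        self.nodes[node_id] = dict(properties)

    def relate(self, source_id, relation, target_id):
        self.edges.add((source_id, relation, target_id))

    def remove(self, node_id):
        self.nodes.pop(node_id, None)
        self.edges = {e for e in self.edges if node_id not in (e[0], e[2])}


class Probe:
    """Shadows one external source: each run() mirrors its current state into the store."""
    def __init__(self, name, read_source, store):
        self.name = name
        self.read_source = read_source   # callable returning {record id: properties}
        self.store = store
        self.known = set()               # node ids created by this probe

    def run(self):
        current = set()
        for record_id, properties in self.read_source().items():
            node_id = f"{self.name}:{record_id}"
            self.store.upsert(node_id, properties)
            current.add(node_id)
        for stale in self.known - current:   # records that vanished upstream
            self.store.remove(stale)
        self.known = current
        return len(current)


# Hypothetical external feed holding one track record.
external = {"t1": {"kind": "track", "speed_mps": 12.0}}
store = GraphStore()
store.upsert("sensor:7", {"kind": "seismic"})
probe = Probe("ugs_feed", lambda: external, store)
probe.run()
store.relate("ugs_feed:t1", "observed_by", "sensor:7")

# The external source changes; the next probe run brings the shadow up to date.
external["t2"] = {"kind": "track", "speed_mps": 3.5}
del external["t1"]
probe.run()
```

In this sketch the application only ever queries `store`; the probe is the single place that knows how to read the external source, which is the decoupling the paragraph above describes.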

The InfoGrid architecture has many benefits for data
exchange in a large data context. It subsumes the
challenges of unified schemas by providing both a
middleware pass-through (using the Probe Framework) and
a centralized graph database (the MeshBase referred to in
Figure 3) on which applications are built.
The broad range of data stores addresses the diverse
nature of data structure and incorporates utilities within
the framework for specialized processing tasks. By
adopting this architecture, InfoGrid allows scalable
applications to be created and maintained more quickly,
more reliably, and at lower cost by addressing the concerns
of data exchange. Moreover, the generalized architecture
can increase the availability of decision-related resources
and therefore increase the probability of successful
decision outcomes [64].

Figure 3. InfoGrid Application Architecture [63].

3.3  Data  Management  Processing 

Data exchange can take the form of delivering the raw data
or publishing data summaries. Delivering the raw data
requires an architecture that can support large volumes of
data. Another method is to design the architecture such
that the processing is embedded in the sensor to enable
faster data delivery, increased speed from data to decisions
(D2D), and a quicker ability to cue other sensors. Sensor
distance and data amount are tradeoffs that must be
accommodated for processing speeds of D2D. Processing
the data at the sensor would introduce communication
challenges between distributed sensors. In both cases, the
architecture must address large amounts of data exchange
and the speed of the communication for data exchange.
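
The tradeoff between delivering raw data and publishing summaries can be made concrete with a back-of-the-envelope latency model. The sketch below assumes latency is on-sensor processing time plus transmission time over a single link; the payload sizes, link rate, and processing time are hypothetical numbers chosen only for illustration.

```python
def delivery_time_s(payload_bytes, link_bps, processing_s=0.0):
    """Seconds until the payload is available at the fusion node."""
    return processing_s + (payload_bytes * 8) / link_bps

# Hypothetical values, not measurements.
raw_bytes = 2_000_000          # a few seconds of multichannel samples
summary_bytes = 200            # feature vector plus a class label
link_bps = 250_000             # low-rate radio link
on_sensor_processing_s = 0.5   # time to compute the summary at the sensor

raw_latency = delivery_time_s(raw_bytes, link_bps)
summary_latency = delivery_time_s(summary_bytes, link_bps, on_sensor_processing_s)
```

Under these assumptions the summary reaches the fusion node in roughly half a second, while the raw data takes about a minute. The model ignores contention, retransmission, and multi-hop routing, which are the communication challenges noted above.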

There are many techniques for processing unstructured
data using known or a priori hypothetical situations.
Since the data is unstructured, it is essential to provide
some context around which exploitation can be built.
Approaches include data transformation, analysis, and
sampling; feature generation, association, selection, and
extraction; and decision classification, such as Bayesian,
Dempster-Shafer, and Support Vector Machine (SVM)
methods, for clustering and association rule extraction.
Using the above methods, either known models or
machine-learned (previously unknown) models can help
assess the data.

Data mining supports the processing of data; however,
ontologies (or semantic models) can improve the
categorization, storage, and indexing of the data. An
ontology improves communication between humans and
machines because it contains machine-processable
structures that disambiguate given data values
as well as data structures.
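
Of the decision-classification methods named in this section, Dempster-Shafer combination is compact enough to sketch. The example below applies Dempster's rule to two mass functions whose focal elements are sets of class labels. The labels and mass values are hypothetical, and the sketch is meant only to show how evidence from two sensors might be combined when each can also express ignorance.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Dempster's rule for two mass functions keyed by frozensets of class labels."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            conflict += wa * wb          # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical labels; EITHER represents "cannot tell which class".
W, T = frozenset({"wheeled"}), frozenset({"tracked"})
EITHER = W | T

seismic = {W: 0.6, EITHER: 0.4}
acoustic = {W: 0.5, T: 0.3, EITHER: 0.2}
fused = dempster_combine(seismic, acoustic)
```

Unlike the Bayesian case, mass left on `EITHER` lets a sensor withhold commitment, and the normalization by one minus the conflict redistributes the mass that the two sources assigned to incompatible classes.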

3.4  Published/Filtered  Data 

Processing of large volumes of data requires metrics,
architectural models, and operationally realistic scenarios
to test data search, access, and dissemination. Properly
measuring significant parameters is critical to quantifying
compliance and outcomes; yet doing so presents a
challenge for eliciting quantifiable data, particularly in the
case of architectural or system-related measures.
Assessment of large data architectures requires a set of
metrics that will objectively quantify the performance of
the architecture, its related technologies, and
process/decision-impacting outcomes. Relative to the JDM
emphasis on large data, it is important to revisit a working
definition of large data: data whose volume is such that it
cannot be completely processed for real-time decision
making. Extending the definition to architectural
metrics, additional focus should be given to scope
measurements that determine the tradeoffs between cost,
timeliness, throughput, accuracy, and confidence. The
performance of a large data architecture (LDA), like that of
any complex system, is affected by the objectivity, context,
and resolution of measurement. As a system increases in
size, it becomes increasingly difficult to identify the
complexity, flexibility, and scalability of all relevant
system elements and the number of human participants, or
even to quantify what should be measured.

There  are  two  general  perspectives  on  architectural 
metrics:  measurement  of  the  descriptive  architecture  itself 
and  the  measurement  of  the  architectural  artifacts.  There 
is  ample  work  detailing  the  measurement  of  artifacts,  but 
the  work  measuring  architectural  quality  is  somewhat 
sparse.  Yet  there  are  advantages  to  descriptive  architecture 
evaluations.  These  benefits  include  financial  benefits, 
increased  understanding  and  documentation  of  the  artifact, 
detection  of  problems  with  the  existing  architecture,  and 
clarification and prioritization of requirements [65].
Evaluating  a  descriptive  architecture  has  an  additional 
benefit  in  that  it  can  provide  the  foundation  for  system 
performance  assessment  before  the  system  is  developed. 

3.5  Data  Management  Metrics 

Data exchange is an important area of information
management that aims at understanding and developing
foundations, methods, and algorithms for transferring data
between differently structured information spaces to be
used for diverse purposes. The exchange of data is but
one critical step in information management; however, it
is a linchpin for the success of any data
management strategy or infrastructure. Efficient and
effective exchange of data must address many issues
beyond just getting the data to where it is needed
(transport). Issues of dissemination (access, availability,
control), quality (truth, relevance, accuracy), and
timeliness (speed-to-need and information lifecycle) form
an exemplar list of challenges in data exchange. Similarly,
many of these metrics translate directly to decision
outcomes (timeliness, user confidence, and accuracy).
From a large data perspective, the process of data
exchange is complicated by limitations in interoperability,
diversity in applications and contexts, and even by the
structure of the data itself.

A summary [66] of ten key requirements includes:

• Visibility: illustration such as folders and plots

• Control: test, push, and pull of information

• Auditing: complete and searchable

• Security: data permissions and access

• Performance: communication and traffic flow

• Scale: amount of data

• Ease of Installation: timeliness of submission

• Ease of Use: distributed and timely access

• Ease of Integration: interoperability

• Cost of Ownership: money and effort

These requirements are similar to the standard QoS/QoI
fusion metrics such as timeliness, accuracy, confidence,
throughput, and cost [26], with most of the efforts in JDM
focusing on throughput and timeliness.
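
To make these metrics concrete, the sketch below computes timeliness, accuracy, confidence, and throughput from a log of classification decisions. The record fields, function name, and sample values are hypothetical; cost is omitted because it depends on deployment details not given here.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Decision:
    sensed_at_s: float    # time the data was collected
    decided_at_s: float   # time the classification was delivered
    predicted: str
    truth: str
    confidence: float     # classifier-reported confidence in [0, 1]
    bytes_moved: int      # data exchanged to reach this decision

def summarize(decisions):
    """Aggregate QoS-style metrics over a non-empty decision log."""
    span_s = (max(d.decided_at_s for d in decisions)
              - min(d.sensed_at_s for d in decisions))
    return {
        "timeliness_s": mean(d.decided_at_s - d.sensed_at_s for d in decisions),
        "accuracy": mean(1.0 if d.predicted == d.truth else 0.0 for d in decisions),
        "confidence": mean(d.confidence for d in decisions),
        "throughput_Bps": sum(d.bytes_moved for d in decisions) / span_s,
    }

# Hypothetical decision log.
log = [
    Decision(0.0, 2.0, "wheeled", "wheeled", 0.9, 400),
    Decision(1.0, 4.0, "tracked", "wheeled", 0.6, 600),
    Decision(2.0, 5.0, "tracked", "tracked", 0.8, 500),
]
report = summarize(log)
```

Here timeliness is the mean delay from sensing to decision and throughput is bytes exchanged per second over the span of the log; other definitions are equally plausible and would change the numbers.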

Data maintenance is akin to equipment maintenance. Just
as equipment maintenance includes reliability,
survivability, reparability, supportability, and other
“ilities”, the same case can be made for data.

(1)  Reliability. The data must be available and timely,
which requires methods for data storage, access, and
retrieval. Data and information also require accurate
updating. For example, acoustic data can be exploited
for a target ID and saved in a target folder. If HUMINT
reports later determine that the target was benign or
incorrectly labeled, the data (acoustic) and information
(target ID) should be updated for the new confidence
(target ID) and timeliness (where the target is at a
certain time). Finally, the incorrect information needs
to be removed from the target folder.

(2)  Survivability. The data needs to be correlated with
the pedigree of the data collection and decision-making
processing. To ensure data availability, the data needs
to “survive” in the database so that it can be correctly
retrieved when needed. Note that as more data is
stored, older data can be lost as the system scales.

(3)  Supportability. One question is: does the current data
need various updates for hardware changes? Conducting
data management also means prioritizing archival
management across successive hardware changes.
Likewise, software changes affect access to and from
the data. Many times, data is stored with protocols and
header files to be accessed by the application and
presentation architecture layers. When there is tight
coupling between these layers and the data layer,
access to the data may be affected. Maintaining
compatibility requires software grandfathering and
other methods of ensuring backward compatibility.
Furthermore, one can anticipate future or emergent
compatibility needs. Supportability could be
maintained with standards and governance that are
common (such as those shared across all the services)
to support JDM and D2D.
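The reliability example in item (1) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names (TargetReport, TargetFolder), the field names, and the choice to mark a report as retracted rather than physically delete it are all hypothetical and do not come from any fielded system. Keeping the retracted entry and an audit log is one way to remove incorrect information from the active view while preserving the pedigree discussed in item (2).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TargetReport:
    """One entry in a target folder (all field names are illustrative)."""
    source: str          # e.g. "acoustic" or "HUMINT"
    target_id: str       # declared identity label
    confidence: float    # belief in the label, in [0, 1]
    timestamp: float     # time of the observation, in seconds
    retracted: bool = False


@dataclass
class TargetFolder:
    reports: List[TargetReport] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)  # pedigree of changes

    def add(self, report: TargetReport) -> None:
        self.reports.append(report)
        self.audit_log.append(f"add {report.source} -> {report.target_id}")

    def correct(self, old_id: str, new_report: TargetReport) -> None:
        """Retract every active report carrying old_id, then file the correction."""
        for r in self.reports:
            if r.target_id == old_id and not r.retracted:
                r.retracted = True
                self.audit_log.append(f"retract {r.source} -> {r.target_id}")
        self.add(new_report)

    def current_id(self) -> Optional[TargetReport]:
        """Most recent report that has not been retracted."""
        active = [r for r in self.reports if not r.retracted]
        return max(active, key=lambda r: r.timestamp) if active else None
```

For instance, an acoustic report labeled as one target type can be filed with add(), and a later HUMINT report can be passed to correct() so that current_id() returns the corrected label while the audit log still records the earlier declaration.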

4  Example/Simulation 

Our example is based on three criteria: (1) a real-world
MOVINT scenario, (2) unstructured data (e.g. time series
data without a prescribed model), and (3) context which
aids in decision-making. Seismic and acoustic data are
collected from many sensors for three targets moving on a
road. The road provides context to develop a model from
the unstructured data to provide MOVINT. The real-world
SensIT collection provides guidance for future collections
to highlight JDM D2D technologies such as joint reporting
for hard (physics-based) and soft (human text-based)
fusion. To perform the data management, we use data
mining [67] techniques such as a support vector machine
(SVM) [68, 69] to process the unstructured data. Through
analysis, we can determine the optimum use of the data to
detect a moving target.
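To make the SVM step concrete, the following is a minimal sketch of a linear SVM trained by full-batch subgradient descent on the regularized hinge loss. It is not the implementation used for the results in this paper: the kernel, features, and solver used here are not specified in the text, and the function names (train_linear_svm, predict) and hyperparameters (lam, lr, epochs) are assumptions chosen for illustration. Labels are assumed to be +1 (target of interest) and -1 (other), and each row of X is assumed to be a feature vector extracted from the sensor time series.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     lr: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the L2-regularized hinge loss.

    Hyperparameter defaults are illustrative assumptions, not tuned values.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:              # sample violates the margin
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """Return +1 or -1 according to the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1
```

With three targets, one such binary classifier per target (one-versus-rest) is a common arrangement; a kernel SVM from an established library would normally replace this sketch in practice.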

4.1  Data  Processing 

To determine methods of Joint Data Management, we
compare two cases: (1) processing the data separately
and (2) jointly processing the acoustic and seismic results.
Figure 4(a) shows the acoustic results as a receiver
operating characteristic (ROC) curve [70].
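For readers unfamiliar with how an ROC curve is built, the sketch below computes the (false alarm rate, detection rate) pairs by sweeping a threshold over classifier scores, and the area under the curve by the trapezoidal rule. The function names (roc_points, auc) and the label convention (1 = target present, 0 = absent) are assumptions for illustration; for three targets, one curve per target would be computed in a one-versus-rest fashion.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false-alarm rate, detection rate) pairs, one per threshold.

    labels: 1 = target present, 0 = target absent. Assumes both classes occur.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep the threshold from the highest score downward.
    ranked = sorted(zip(scores, labels), key=lambda p: -p[0])
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all samples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

A curve that rises toward the upper-left corner (high detection rate at a low false alarm rate) gives an area close to 1, while a classifier no better than chance gives an area near 0.5.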


Figure  4.  (a)  Acoustic  and  (b)  Seismic  ROCs  for  3  targets. 

Figure 4(b) shows the seismic results. Note that for this
data set, the seismic results have a lower probability of
false alarms for target 3 and target 2; however, target 2
exhibits more confusion.

4.2  Joint  Data  Management 

Next, we explore the case of joint seismic and acoustic
data management, using an SVM for classification; the
results are shown in Figure 5. The key benefit is the
reduction in false alarms, which users desire. In general,
the joint analysis supports better decision making:
detection improved for a constant false alarm rate, target
location accuracy improved from the joint spatial
measurements, and decision making was timelier because
fewer measurements were needed to confirm the target ID
(i.e. a decision made with two modalities required fewer
measurements than one made with a single modality).
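The timeliness argument can be illustrated with a simple recursive Bayesian update, which is a different technique from the SVM used for the results here and is offered only as a sketch. It assumes that each modality supplies a per-class likelihood at every time step and that the modalities are conditionally independent given the target class; the function names (bayes_update, steps_to_decide), the class labels, and the 0.95 decision threshold are hypothetical.

```python
from typing import Dict, List, Sequence

Belief = Dict[str, float]


def bayes_update(prior: Belief, likelihood: Belief) -> Belief:
    """Multiply the prior by a per-class likelihood and renormalize."""
    unnorm = {c: prior[c] * likelihood[c] for c in prior}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def steps_to_decide(streams: Sequence[List[Belief]], classes: Sequence[str],
                    threshold: float = 0.95) -> int:
    """Count time steps until some class posterior reaches the threshold.

    streams: one list of per-step likelihoods for each modality. Modalities
    are treated as conditionally independent given the class (an assumption).
    Returns -1 if the threshold is never reached.
    """
    belief = {c: 1.0 / len(classes) for c in classes}  # uniform prior
    for step in range(len(streams[0])):
        for stream in streams:
            belief = bayes_update(belief, stream[step])
        if max(belief.values()) >= threshold:
            return step + 1
    return -1
```

When both streams favor the same target, passing the acoustic and seismic streams together reaches the threshold in no more steps than either stream alone, which mirrors the reduction in required measurements reported above.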


Figure  5.  Combined  Seismic  and  Acoustic  Results. 


4.3  Visual  Analytics  MOVINT  Display 

Visual analytics provide methods to visualize and jointly
manage data to decisions for MOVINT capabilities. For
the operational analysis, we can provide an object track
presentation. Here we present the salient features of the
MOVINT classification information. Figure 6 presents a
short history of the acoustic information, and Figure 7
shows the robust features used for analysis.


Figure  6.  Acoustic  Feature  Analysis. 


Figure  7.  Feature  Discrimination  Plot. 

We see that features 2-5 discriminate target 3 (blue),
while features 6-7 discriminate target 2 (red), and features
8-12 discriminate target 1 (cyan). From these plots, a user
can determine not only the object location, but also the key
MOVINT target features enabling positive target ID.

To utilize the QoS/QoI metrics [26] for Value of
Information (VoI) [33], we determine whether the sensor
is “useful” in decision-making at each time step. Figure 8
is a truncated plot of the VoI metrics. The summary VoIs
over all time steps and all three targets are: combined
(seismic/acoustic) sensor = 0.782, acoustic = 0.684, and
seismic = 0.664.
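One possible reading of the summary VoI, offered here as an assumption rather than the definition used in [33], is the fraction of time steps at which a sensor was judged useful. The sketch below follows that reading; the usefulness test (a correct ID declared with at least a minimum confidence), the function names, and the 0.5 confidence floor are all hypothetical.

```python
from typing import Sequence


def is_useful(declared_id: str, true_id: str, confidence: float,
              min_confidence: float = 0.5) -> bool:
    """Hypothetical usefulness test: correct ID declared with enough confidence."""
    return declared_id == true_id and confidence >= min_confidence


def value_of_information(useful_flags: Sequence[bool]) -> float:
    """Fraction of time steps at which the sensor was judged useful."""
    if not useful_flags:
        raise ValueError("need at least one time step")
    return sum(useful_flags) / len(useful_flags)
```

Under this reading, a summary value is obtained for each sensor configuration by evaluating is_useful at every time step and passing the resulting flags to value_of_information, so configurations can be compared on a common 0 to 1 scale.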
We present a second example of JDM for D2D in our
companion paper at the Fusion 11 conference using wide
area motion imagery (WAMI) [71].

5  Conclusions 

We have explored methods for Joint Data Management
(JDM) for MOVINT data-to-decision making. We utilize
a support vector machine to process the unstructured
classification data as well as the structured data of the
target location. We showed that the JDM approach
reduces false alarms for enhanced and timely decision
making. Next steps would be to investigate different
classifiers and optimum feature vectors to improve JDM
performance. JDM Information Quality, Quality of
Service, and Value of Information needs can be linked to
other sources of soft data (human reports) and hard data
(physics-based sensing) [72] to update situation reporting.
JDM will require new methods in database management,
information management, and measures of effectiveness
for mission support that support the Data Fusion
Information Group (DFIG) Level 5 Fusion [26, 73].